(57) Abstract



A sprite-based coding system includes an encoder and decoder where sprite-building is automatic and segmentation of the sprite
object is automatic and integrated into the sprite building as well as the coding process. The sprite object is distinguished from the rest
of the video objects on the basis of its motion. The sprite object moves according to the dominant component of the scene motion, which is
usually due to camera motion or zoom. Hence, the sprite-based coding system utilizes dominant motion to distinguish background images
from foreground images. The sprite-based coding system is easily integrated into a video object-based coding framework such as MPEG-4,
where the shape and texture of individual video objects are coded separately. The automatic segmentation integrated in the sprite-based coding
system identifies the shape and texture of the sprite object.
DESCRIPTION



SPRITE-BASED VIDEO CODING SYSTEM 



Field of the Invention 

This invention relates to a mechanism by which a sprite (also called mosaic) is built 
automatically both in an encoder and a decoder, operating in a separate shape/texture 
coding environment such as MPEG-4. We also discuss applications that will utilize this 
technology. 

Background of the Invention 

A mosaic image (the terms mosaic and sprite will be used interchangeably) is built from 
images of a certain scene object over several video frames. For instance, a mosaic of the 
background scene in the case of a panning camera will result in a panoramic image of the
background.

In MPEG-4 standardization activities, two major types of sprites and sprite-based coding 
are defined. The first type is called off-line static sprite. An off-line static sprite is a 
panoramic image which is used to produce a sequence of snapshots of the same video 
object (such as background). Each individual snapshot is generated by simply warping
portions of the mosaic content and copying them to the video buffer where the current video
frame is being reconstructed. Static sprites are built off-line and are transmitted as side 
information. 

The second type of mosaic is called on-line dynamic sprite. On-line dynamic sprites are
used in predictive coding of a video object. A prediction of each snapshot of the video
object in a sequence is obtained by warping a section of the dynamic sprite. The residual
signal is coded and used to update the mosaic in the encoder and the decoder
concurrently. The content of a dynamic mosaic may be constantly updated to include the
latest video object information. As opposed to static sprites, dynamic sprites are built on-line
simultaneously in the encoder and decoder. Consequently, no additional information
needs to be transmitted.

Summary of the Invention 

We have described a syntax for MPEG-4 which provides a unified syntax [2] for off-line
static sprite and on-line dynamic sprite-based coding. Our syntax also allows new modes
that we refer to as "dynamic off-line sprite-based coding," where predictive coding is
performed on the basis of an off-line sprite (as opposed to directly copying the warped
sprite as in the case of off-line static sprites), and "on-line static sprite-based coding,"
where the encoder and the decoder stop building the sprite further and use it as a static
sprite, whether it is partially or fully completed.

Both off-line static and on-line dynamic sprite-based coding require constructing a sprite.
In the former case, the sprite is built prior to transmission. In the latter case, the sprite is
built on-line during the transmission. So far, MPEG-4 has assumed that the outline
(segmentation) of the object for which the sprite is going to be built is known a priori at
every time instant. Although this is true in certain applications, especially in
post-production or content generation using blue screen techniques, automatic segmentation
should be an integral part of sprite building in general. There is therefore a need for
sprite-based coding systems where sprite building does not require a priori knowledge of scene
segmentation.

In this disclosure, we describe a sprite-based coding system (encoder and decoder) where
sprite-building is automatic and segmentation of the sprite object is automatic and
integrated into the sprite building as well as the coding process.

We assume that the sprite object can be distinguished from the rest of the video objects on
the basis of its motion. We assume that the sprite object moves according to the dominant
component of the scene motion, which is usually due to camera motion or zoom. Hence,
our system utilizes dominant motion, which is known to those of skill in the art.

Our system is suitable for a video object-based coding framework such as MPEG-4 [3],
where shape and texture of individual video objects are coded separately. The automatic
segmentation integrated in the described system identifies the shape and texture of the
sprite object.

There are several possible applications of the invention. In very low bit rate applications,
coding video frames in terms of the video objects within them may be expensive, because the
shape of such objects may consume a significant portion of the limited bit budget. In such
cases, our system can fall back to frame-based coding, where automatic segmentation is
only used to obtain better dominant motion estimation for sprite building and dominant
motion-compensated prediction, as described in the "Operations" section later herein.

The described coding system has features which make it suitable for applications where
the camera view may change frequently, such as video conferencing with multiple cameras, or
a talk show captured with more than one camera. Our system may be applied to building
multiple sprites and using them as needed. For instance, if the camera goes back and forth
between two participants in front of two different backgrounds, two background sprites
are built and used as appropriate. More specifically, when background A is visible,
building of the sprite for background B and its use in coding are suspended until
background B appears again. The use of multiple sprites in this fashion is possible within
the MPEG-4 framework, as will be described in the "Operations" section.
The disclosed system generates a sprite during the encoding process, as will be described
later herein. However, the resulting sprite may be subsequently used, after coding, as a
representative image of the compressed video clip. Its features can be used to identify the
features of the video clip, which can then be used in feature-based (or content-based)
storage and retrieval of video clips. Hence sprite-based coding provides a natural fit to
populating a video library of bitstreams, where sprite images generated during the encoding
process act as representative images of the video clips. Indeed, the mosaics can also be
coded using a still image coding method. Such a video library system is depicted in Fig. 5.

In a similar fashion, one or several event lists may be associated with a background sprite.
A possible choice for an event list is the set of consecutive positions of one or several
vertices belonging to each foreground object. Such a list can then be used to generate a
token representative image of the foreground object position in the sprite. Consecutive
positions of each vertex could either be linked by a straight line or could share a distinct
color. The consecutive positions of the vertex may be shown statically (all successive
positions in the same sprite) or dynamically (vertex positions shown in the mosaic
successively in time). A vertex here can be chosen to correspond to any distinctive feature
of the foreground object, such as the center of gravity or a salient point in the shape of the
object. In the latter case, and if several vertices are used simultaneously, the vertices might
be arranged according to a hierarchical description of the object shape. With this
technique, a user or a presentation interface has the freedom to choose from coarse to
finer shapes to show successive foreground object positions in the background sprite. This
concept may be used in a video library system to retrieve content based on motion
characteristics of the foreground.


The automatic sprite building portion of the described system may be used in an off-line 
mode in a video conferencing application where the off-line sprite is built prior to 
transmission. Depiction of such a system is shown in Fig. 6. The described system can 
also generate a sprite that has a higher spatial resolution than the original images. 


Brief Descriptions of the Drawings 
Fig. 1 illustrates the steps used in the method of the invention at time t-1.
Fig. 2 illustrates the steps used in the method of the invention at time t to t+1.
Fig. 3 illustrates the steps used in the method of the invention at time t+1 to t+2.
Fig. 4 is a block diagram of the method of the invention.
Fig. 5 is a block diagram of the system of the invention.
Fig. 6 depicts the system and method of the invention as used in a video conferencing
system. 

Fig. 7 depicts how consecutive positions of a foreground object may be represented in a
mosaic according to the invention. 


Detailed Description of the Preferred Embodiments 

The described method is designed to progressively learn to dissociate foreground from
background while building a background mosaic at the same time. Steps 1 to 10 are
repeated until the construction of the background is complete or until it is aborted.

Assumptions 

The notations are as follows: 


$I(s,t)$ denotes the content of the video frame at spatial position $s$ and at time $t$.

$W_{t\leftarrow t-1}(\cdot)$ denotes a warping operator which maps the image at time (t-1) to time
t. For a given pixel location $s_0$ in a video buffer at time t, this warping operation is
performed by copying the pixel value at the corresponding location $s$ in frame (t-1). The
correspondence between location $s_0$ and location $s$ is established by a particular and
well-defined transformation such as an affine or a perspective transformation.

$\chi_x(s,t)$ is an indicator buffer, say for quantity x, which can be either 1 or 2 bits deep for all
spatial locations $s$.

Thresh is a threshold value. The operations < Thresh and > Thresh are symbolic and can 
represent complex thresholding operations. 

The size (per color component) of the current image frame $I(s,t)$ is $M_t \times N_t$ and the size
of the previous compressed/decompressed frame after warping,
$W_{t\leftarrow t-1}(C^{-1}C\{I(s,t-1)\})$, is such that it can be inscribed in a rectangular array of
$M_{t-1} \times N_{t-1}$ pixels.

The sprite $M(s,t)$ is an image intensity (texture) buffer of size $M_m \times N_m$ per color
component. The field $\chi_{mosaic}(s,t)$ is a single component field of the same size.

The construction of the sprite is started at time t. The image $I(s,t-1)$ has already been
compressed and decompressed and it is available in both the encoder and the decoder.
In the following steps, the image content is assumed to have a background and a
foreground part (or VO), and a mosaic of the background is built.

Step 1: Initialization. 

Referring now to Figs. 1-3, the results of the steps of the method described in the previous
section are depicted. Fig. 1 illustrates steps 0 through 11 from time t-1, the instant when
mosaic building is initiated, to time t, when a new video frame or field has been
acquired. Figs. 2 and 3 illustrate steps 2 through 11 from time t to t+1 and from time t+1
to t+2, respectively. At the top left corner in each of these figures (A) is shown the newly
acquired video frame, which is compared to the previous video frame (next image field to
the right) (B) once it has been compressed/de-compressed and warped (step 2). Step 3 is
illustrated by the rightmost image field (C) in the first row of each figure. This field shows
the area where content change has been detected. The status of the mosaic buffer is
shown in the leftmost image field in the second row (D). This buffer is used to identify the
new background areas as described in step 4. These areas correspond to regions where
background was not known until now. Foreground identification is illustrated by the
rightmost image in the second row (F). The operations associated with this image are
described in step 5, which uses the change map, the mosaic and the new background areas
to define the foreground. Steps 6 and 7 of the method are illustrated by the two leftmost
image fields in the third row (G, H). Here, background information comes from the
compressed/decompressed foreground information obtained in the previous step. Finally,
the mosaic updating process is illustrated by the bottom right image field (I). This process
takes place in steps 8, 9, 10 and 11 of the method.

The binary field $\chi_{mosaic}(s,t)$ is initialized to 0 for every position $s$ in the buffer, meaning
that the content of the mosaic is unknown at these locations.

The content of the mosaic buffer $M(s,t)$ is initialized to 0.

The warping parameters from the current video frame $I(s,t-1)$ to the mosaic are
initialized to $W_{t_0\leftarrow t-1}(\cdot)$, $t_0$ here representing an arbitrary fictive time. This initial
warping is important as it provides a way to specify the "resolution" or the "time
reference" used to build the mosaic. Potential applications of this initial mapping are
making a mosaic with super spatial resolution or selecting an optimal time $t_0$
minimizing the distortion introduced by the method. These initial warping parameters are
transmitted to the decoder.
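A minimal sketch of Step 1 in Python, assuming the warps are affine and stored as 3x3 homogeneous matrices (the text does not fix the warping model). The names `initial_warp`, `apply_warp` and `init_mosaic` are hypothetical; the `zoom` argument illustrates the "resolution" role of $W_{t_0\leftarrow t-1}(\cdot)$, and `dx`, `dy` place the first frame inside the mosaic buffer.

```python
from typing import List, Tuple

Matrix3 = List[List[float]]  # 3x3 homogeneous affine matrix


def initial_warp(zoom: float = 1.0, dx: float = 0.0, dy: float = 0.0) -> Matrix3:
    """Hypothetical W_{t0<-t-1}: uniform zoom, then translation (dx, dy)
    placing frame (t-1) inside the mosaic buffer."""
    return [[zoom, 0.0, dx],
            [0.0, zoom, dy],
            [0.0, 0.0, 1.0]]


def apply_warp(w: Matrix3, x: float, y: float) -> Tuple[float, float]:
    """Map the point (x, y) through the affine matrix w."""
    return (w[0][0] * x + w[0][1] * y + w[0][2],
            w[1][0] * x + w[1][1] * y + w[1][2])


def init_mosaic(rows: int, cols: int):
    """Step 1: texture M(s,t) and indicator chi_mosaic(s,t) are set to 0;
    chi_mosaic = 0 means the mosaic content is unknown at s."""
    mosaic = [[0.0] * cols for _ in range(rows)]
    chi_mosaic = [[0] * cols for _ in range(rows)]
    return mosaic, chi_mosaic


mosaic, chi_mosaic = init_mosaic(rows=64, cols=128)
w0 = initial_warp(zoom=2.0, dx=16.0, dy=8.0)
print(apply_warp(w0, 10.0, 5.0))  # (36.0, 18.0)
```

Since these parameters are sent to the decoder, both sides start from the same empty mosaic and the same reference frame.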

Step 2: Acquisition. 

The image $I(s,t)$ is acquired and the forward warping parameters $W_{t\leftarrow t-1}(\cdot)$ for mapping the
previous image $I(s,t-1)$ to $I(s,t)$ are computed. The number of warping parameters as well as the
method for estimating these parameters are not specified here. A dominant motion
estimation algorithm such as the one given in [4] may be used. The warping parameters
are composed with the current warping parameters, resulting in the mapping $W_{t\leftarrow t_0}(\cdot)$.
These parameters are transmitted to the decoder.
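If each warp is treated as an affine coordinate transform (an assumption; the text leaves the model open), composing the newly estimated $W_{t\leftarrow t-1}$ with the accumulated mapping reduces to a 3x3 matrix product. `compose`, `invert_affine` and `translation` are hypothetical helper names, and the example uses pure translations for readability.

```python
from typing import List

Matrix3 = List[List[float]]


def compose(a: Matrix3, b: Matrix3) -> Matrix3:
    """Return the matrix of a∘b (apply b first, then a)."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def invert_affine(w: Matrix3) -> Matrix3:
    """Invert [[a, b, tx], [c, d, ty], [0, 0, 1]]; assumes a*d - b*c != 0."""
    (a, b, tx), (c, d, ty) = w[0], w[1]
    det = a * d - b * c
    ia, ib, ic, id_ = d / det, -b / det, -c / det, a / det
    return [[ia, ib, -(ia * tx + ib * ty)],
            [ic, id_, -(ic * tx + id_ * ty)],
            [0.0, 0.0, 1.0]]


def translation(dx: float, dy: float) -> Matrix3:
    return [[1.0, 0.0, dx], [0.0, 1.0, dy], [0.0, 0.0, 1.0]]


# W_{t-1<-t0} is the inverse of the mapping W_{t0<-t-1} from Step 1.
w_t0_from_tm1 = translation(16.0, 8.0)
w_tm1_from_t0 = invert_affine(w_t0_from_tm1)
# Newly estimated dominant motion W_{t<-t-1}: a 3-pixel pan.
w_t_from_tm1 = translation(3.0, 0.0)
# Chaining the two gives W_{t<-t0}, the mapping sent to the decoder.
w_t_from_t0 = compose(w_t_from_tm1, w_tm1_from_t0)
print(w_t_from_t0[0][2], w_t_from_t0[1][2])  # -13.0 -8.0
```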

Step 3: Detect Change in Content Between Previously Coded/Decoded Frame and 
Current Frame. 

i) Initialize a large buffer of size $M_b \times N_b$ greater than the image
($M_b > M_t$, $N_b > N_t$) and possibly as large as the mosaic. The buffer is 2 bits deep at
every location. The buffer is initialized to 3 to indicate unknown status.



ii) Compute (motion-compensated) scene changes over the common image support. Give
label 0 to all locations where the change in content is deemed small. Give label 1a to
locations where the change is detected to be large. To make regions more homogeneous,
implement additional operations (e.g., morphological operations) which either reset the
label from 1a to 0 or set the label from 0 to 1a. Regions labeled 0 will typically be
considered and coded as part of the background Video Object, while regions labeled 1a
will typically be encoded as part of the foreground Video Object.

$$\chi_{change}(s,t) = \begin{cases} 0 & \text{if } \left| I(s,t) - W_{t\leftarrow t-1}\left(C^{-1}C\{I(s,t-1)\}\right) \right| < \mathrm{Thres}_{change} \\ 1a & \text{otherwise} \end{cases}$$

where $\mathrm{Thres}_{change}$ denotes a pre-defined threshold value.

iii) Tag new image regions, where the support of the image at time t does not overlap with the
support of the image at time (t-1), as

$$\chi_{change}(s,t) = 1b$$
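A minimal Python sketch of Step 3 under simplifying assumptions: the label buffer has the size of the frame rather than $M_b \times N_b$, labels 1a and 1b are encoded as the integers 1 and 2, the morphological clean-up in ii) is omitted, and `None` marks pixels of frame t that frame t-1 does not cover. `detect_change` is a hypothetical name.

```python
from typing import List, Optional

# 2-bit label values; encoding 1a/1b as 1/2 is an assumption of this sketch.
LABEL_0, LABEL_1A, LABEL_1B, UNKNOWN = 0, 1, 2, 3


def detect_change(cur: List[List[float]],
                  prev_warped: List[List[Optional[float]]],
                  thres_change: float) -> List[List[int]]:
    """Step 3: label each pixel of frame t.

    prev_warped holds W_{t<-t-1}(C^-1 C{I(s,t-1)}) sampled on the grid of
    frame t, with None where frame t-1 has no support (uncovered area).
    """
    labels = [[UNKNOWN] * len(row) for row in cur]          # i) "unknown"
    for y, row in enumerate(cur):
        for x, value in enumerate(row):
            ref = prev_warped[y][x]
            if ref is None:
                labels[y][x] = LABEL_1B                     # iii) new image region
            elif abs(value - ref) < thres_change:
                labels[y][x] = LABEL_0                      # ii) small change
            else:
                labels[y][x] = LABEL_1A                     # ii) large change
    return labels


cur = [[10.0, 10.0, 50.0], [10.0, 10.0, 10.0]]
prev = [[11.0, 10.0, 10.0], [None, 10.0, 10.0]]
print(detect_change(cur, prev, thres_change=5.0))  # [[0, 0, 1], [2, 0, 0]]
```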

Step 4: Identify New Background Areas. 

A new background area is detected if there has not been any change in image content in
the last two video frames. The corresponding area in the mosaic must also indicate that the
background at this location is unknown. The resulting new background area is then
pasted to any neighboring regions where background is known. As will be seen in later
steps, incorporation of new background data into the mosaic must be done according to
compressed/de-compressed background shape information to avoid any drift between
encoder and decoder.
$$\chi_{new}(s,t) = \begin{cases} 1 & \text{if } \left(\chi_{change}(s,t) = 0\right) \wedge \left(W_{t\leftarrow t_0}(\chi_{mosaic}(s,t-1)) = 0\right) \\ 0 & \text{otherwise} \end{cases}$$

Here, a mosaic indicator value of 0 means that the background is unknown.
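Step 4 is a per-pixel logical test. The sketch below assumes both inputs are already sampled on the grid of frame t, the second one standing for $W_{t\leftarrow t_0}(\chi_{mosaic}(s,t-1))$. The name `new_background` is hypothetical, and the pasting to neighboring known regions is not modeled.

```python
from typing import List

Grid = List[List[int]]


def new_background(chi_change: Grid, chi_mosaic_warped: Grid) -> Grid:
    """Step 4: chi_new(s,t) = 1 where no content change was detected
    (chi_change == 0) and the warped mosaic map says the background is
    still unknown (W_{t<-t0}(chi_mosaic(s,t-1)) == 0); else 0."""
    return [[1 if (c == 0 and m == 0) else 0 for c, m in zip(row_c, row_m)]
            for row_c, row_m in zip(chi_change, chi_mosaic_warped)]


chi_change = [[0, 0, 1], [2, 0, 0]]
chi_mosaic_warped = [[1, 0, 0], [0, 0, 1]]
print(new_background(chi_change, chi_mosaic_warped))  # [[0, 1, 0], [0, 1, 0]]
```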
Step 5: Perform Foreground/Background Segmentation. 



First, look at regions where the background is known ($W_{t\leftarrow t_0}(\chi_{mosaic}(s,t-1)) = 1$). Perform
thresholding to distinguish the foreground from the background (case (i)). For regions
where the background is not known, tag as foreground any regions where changes have
occurred (labels 1a and 1b defined in step 3) (cases (iii) and (iv)).

Case (ii) represents new background areas, which are excluded from being part of the
foreground.

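i) If $W_{t\leftarrow t_0}(\chi_{mosaic}(s,t-1)) = 1$:

$$\chi_{fg}(s,t) = \begin{cases} 1a & \text{if } \left| I(s,t) - W_{t\leftarrow t_0}(M(s,t-1)) \right| > \mathrm{Thres}_{fg} \\ 0 & \text{otherwise} \end{cases}$$

where $\mathrm{Thres}_{fg}$ is a pre-defined threshold value which is used here to segment foreground
from background.

ii) else if $\chi_{new}(s,t) = 1$: $\chi_{fg}(s,t) = 0$

iii) else if $\left(W_{t\leftarrow t_0}(\chi_{mosaic}(s,t-1)) = 0\right) \wedge \left(\chi_{change}(s,t) = 1a\right)$: $\chi_{fg}(s,t) = 1a$

iv) else ($\left(W_{t\leftarrow t_0}(\chi_{mosaic}(s,t-1)) = 0\right) \wedge \left(\chi_{change}(s,t) = 1b\right)$): $\chi_{fg}(s,t) = 1b$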
In cases (iii) and (iv), a sub-classification of the regions tagged as 1 into either regions 1a
and 1b is used for the sole purpose of providing the encoder with the flexibility to follow
different macroblock selection rules. For example, regions tagged as 1a might be
preferably coded with inter-frame macroblocks since these regions occur over common
image support. On the other hand, regions tagged as 1b might preferably be coded with
intra-frame macroblocks since these regions do not share a common support with the
previous frame.
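The four cases of Step 5 can be written as an if/elif chain evaluated in order, as in the hypothetical `segment_foreground` below. It reuses the integer encoding assumed in the Step 3 sketch (1a = 1, 1b = 2) and assumes $\chi_{new}$ was computed as in Step 4, so the final branch only ever sees labels 1a or 1b.

```python
from typing import List

LABEL_0, LABEL_1A, LABEL_1B = 0, 1, 2  # same encoding as the Step 3 sketch


def segment_foreground(cur, mosaic_pred, chi_mosaic_warped, chi_new,
                       chi_change, thres_fg: float) -> List[List[int]]:
    """Step 5: foreground map chi_fg(s,t).

    mosaic_pred is W_{t<-t0}(M(s,t-1)) and chi_mosaic_warped is
    W_{t<-t0}(chi_mosaic(s,t-1)), both sampled on the grid of frame t.
    """
    fg = []
    for y, row in enumerate(cur):
        out = []
        for x, value in enumerate(row):
            if chi_mosaic_warped[y][x] == 1:            # (i) background known
                diff = abs(value - mosaic_pred[y][x])
                out.append(LABEL_1A if diff > thres_fg else LABEL_0)
            elif chi_new[y][x] == 1:                    # (ii) new background
                out.append(LABEL_0)
            else:                                       # (iii)/(iv) changed area
                out.append(chi_change[y][x])            # keeps 1a or 1b
        fg.append(out)
    return fg


cur = [[100.0, 100.0, 100.0], [7.0, 7.0, 7.0]]
pred = [[100.0, 50.0, 0.0], [0.0, 0.0, 0.0]]
print(segment_foreground(cur, pred,
                         chi_mosaic_warped=[[1, 1, 0], [0, 0, 0]],
                         chi_new=[[0, 0, 0], [1, 0, 0]],
                         chi_change=[[0, 1, 2], [0, 1, 2]],
                         thres_fg=20.0))  # [[0, 1, 2], [0, 1, 2]]
```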
Step 6: Compress/Decompress Foreground Shape and Texture.

Use conventional (I, P or B-VOP) prediction mode to encode foreground regions labeled
as 1a and 1b. In the case of P or B-VOPs, individual macroblocks can either use inter-frame
prediction or intra-frame coding. The pixels corresponding to regions labeled 1b
(newly revealed background not represented in the mosaic) are favored to be coded as intra
macroblocks. The shape of the foreground is compressed and transmitted as well. Once
de-compressed, this shape is used by the encoder and decoder to update the content of the
mosaic. This process can be performed using the MPEG-4 VM 5.0 [3].

Step 7: Get Background Shape. 

Get background shape from compressed/de-compressed foreground shape. 
Compression/De-compression is necessary here to ensure that encoder and decoder share 
the same shape information. 



$$\chi_{bg}(s,t) = \begin{cases} 1 & \text{if } C^{-1}C\{\chi_{fg}(s,t)\} = 0 \\ 0 & \text{otherwise} \end{cases}$$

where $C^{-1}C\{\cdot\}$ denotes shape coding/decoding, which for instance can be performed as
described in [3].
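The point of Step 7 is that the background shape is derived from the decoded foreground shape, not from $\chi_{fg}$ itself. The sketch below makes this visible with a deliberately lossy, hypothetical stand-in for $C^{-1}C\{\cdot\}$ (majority voting over 2x2 tiles); it is not the MPEG-4 shape coder of [3]. A foreground pixel swallowed by the codec ends up in the background shape, exactly as it would at the decoder.

```python
from typing import List

Grid = List[List[int]]


def shape_codec_roundtrip(mask: Grid, block: int = 2) -> Grid:
    """Hypothetical lossy stand-in for C^-1 C{.}: every block x block tile
    is replaced by its majority value (ties go to foreground).
    It is NOT the MPEG-4 shape coder; it only mimics a lossy round trip."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for by in range(0, h, block):
        for bx in range(0, w, block):
            cells = [(y, x) for y in range(by, min(by + block, h))
                     for x in range(bx, min(bx + block, w))]
            ones = sum(1 for y, x in cells if mask[y][x] != 0)
            value = 1 if 2 * ones >= len(cells) else 0
            for y, x in cells:
                out[y][x] = value
    return out


def background_shape(chi_fg: Grid) -> Grid:
    """Step 7: chi_bg(s,t) = 1 where the *decoded* foreground shape is 0."""
    decoded = shape_codec_roundtrip(chi_fg)
    return [[1 if v == 0 else 0 for v in row] for row in decoded]


chi_fg = [[0, 0, 1, 2], [0, 1, 0, 1]]      # labels 1a/1b count as foreground
print(background_shape(chi_fg))            # [[1, 1, 0, 0], [1, 1, 0, 0]]
```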

Step 8: Initialize New Background Texture in Mosaic.

Identify regions where new background has occurred and initialize the mosaic with content
found in the previous video frame (time (t-1)). Note that the field $\chi_{new}(s,t)$ cannot be used
here since this information is unknown to the decoder.

$$M'(s,t-1) = \begin{cases} M(s,t-1) & \text{if } \chi_{mosaic}(s,t-1) = 1 \\ W_{t_0\leftarrow t-1}\left(C^{-1}C\{I(s,t-1)\}\right) & \text{if } \left(W_{t_0\leftarrow t}(\chi_{bg}(s,t)) = 1\right) \wedge \left(\chi_{mosaic}(s,t-1) = 0\right) \end{cases}$$
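A sketch of Step 8 on the mosaic grid, assuming the warped quantities $W_{t_0\leftarrow t}(\chi_{bg}(s,t))$ and $W_{t_0\leftarrow t-1}(C^{-1}C\{I(s,t-1)\})$ have already been computed. Positions covered by neither case of the equation are left unchanged, which is an assumption since the text does not define them. Only decoder-side quantities are used, mirroring the remark that $\chi_{new}(s,t)$ must not be used here. `init_new_background` is a hypothetical name.

```python
from typing import List

Image = List[List[float]]
Grid = List[List[int]]


def init_new_background(mosaic: Image, chi_mosaic: Grid,
                        bg_in_mosaic: Grid, prev_decoded_in_mosaic: Image) -> Image:
    """Step 8: build M'(s,t-1) on the mosaic grid.

    bg_in_mosaic is W_{t0<-t}(chi_bg(s,t)); prev_decoded_in_mosaic is
    W_{t0<-t-1}(C^-1 C{I(s,t-1)}). Both are available at the decoder.
    """
    m_prime = [row[:] for row in mosaic]        # case 1: known content is kept
    for y, row in enumerate(mosaic):
        for x in range(len(row)):
            if chi_mosaic[y][x] == 0 and bg_in_mosaic[y][x] == 1:
                # case 2: newly exposed background, seeded from frame t-1
                m_prime[y][x] = prev_decoded_in_mosaic[y][x]
    return m_prime


print(init_new_background([[5.0, 0.0, 0.0]], [[1, 0, 0]],
                          [[1, 1, 0]], [[9.0, 8.0, 7.0]]))  # [[5.0, 8.0, 0.0]]
```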
Step 9: Calculate Background Texture Residuals From Mosaic Prediction.

If $\chi_{bg}(s,t) = 1$, calculate the difference signal by using the mosaic content as predictor. The
resulting $\Delta I(s,t)$ is used to compute the difference signal over the entire macroblock
where the pixel (s,t) is located. This difference signal is compared to conventional
difference signals produced by using prediction from the previous and the next video
frame (P or B prediction mode). The macroblock type is selected according to the best
prediction mode. The residual signal is transmitted to the decoder along with the
compressed background shape as described in [2].

$$\Delta I(s,t) = I(s,t) - W_{t\leftarrow t_0}(M'(s,t-1))$$
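The text leaves the "best prediction mode" criterion open. The sketch below assumes the sum of absolute differences (SAD), a common choice, and compares the mosaic predictor $W_{t\leftarrow t_0}(M'(s,t-1))$ against P and B predictors for one macroblock. `select_macroblock_mode` is a hypothetical name, and the 2x2 block stands in for a real 16x16 macroblock.

```python
from typing import Dict, List, Tuple

Block = List[List[float]]


def sad(block: Block, pred: Block) -> float:
    """Sum of absolute differences between a macroblock and a prediction."""
    return sum(abs(a - b) for ra, rb in zip(block, pred) for a, b in zip(ra, rb))


def select_macroblock_mode(block: Block,
                           predictors: Dict[str, Block]) -> Tuple[str, Block]:
    """Pick the predictor with the lowest SAD and return (mode, residual)."""
    best = min(predictors, key=lambda mode: sad(block, predictors[mode]))
    pred = predictors[best]
    residual = [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(block, pred)]
    return best, residual


mb = [[10.0, 10.0], [10.0, 10.0]]
mode, res = select_macroblock_mode(mb, {
    "sprite": [[10.0, 9.0], [10.0, 10.0]],   # W_{t<-t0}(M'(s,t-1))
    "P": [[0.0, 0.0], [0.0, 0.0]],           # previous frame
    "B": [[12.0, 12.0], [12.0, 12.0]],       # bidirectional
})
print(mode, res)  # sprite [[0.0, 1.0], [0.0, 0.0]]
```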
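Step 10: Update Background Shape in Mosaic.
Update the mosaic map to include the shape of the new background.

$$\chi_{mosaic}(s,t) = \begin{cases} 1 & \text{if } \chi_{mosaic}(s,t-1) = 1 \\ 1 & \text{if } \left(W_{t_0\leftarrow t}(\chi_{bg}(s,t)) = 1\right) \wedge \left(\chi_{mosaic}(s,t-1) = 0\right) \\ 0 & \text{otherwise} \end{cases}$$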

Step 11: Update Mosaic. 

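Update the content of the mosaic in regions corresponding to new or non-covered
background in frame t.

$$M(s,t) = \left[1 - \alpha\, W_{t_0\leftarrow t}(\chi_{bg}(s,t))\right] M'(s,t-1) + \alpha\, W_{t_0\leftarrow t}(\chi_{bg}(s,t)) \left[M'(s,t-1) + W_{t_0\leftarrow t}\left(C^{-1}C\{\Delta I(s,t)\}\right)\right]$$

The selection of the value of the blending parameter $\alpha$ ($0 \le \alpha \le 1$) in the above equation
is application dependent.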
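Steps 10 and 11 can be applied together in one pass over the mosaic grid. The hypothetical `update_mosaic` below assumes the background shape and the decoded residual have already been warped with $W_{t_0\leftarrow t}$, and implements both equations term by term.

```python
from typing import List, Tuple

Image = List[List[float]]
Grid = List[List[int]]


def update_mosaic(m_prime: Image, chi_mosaic_prev: Grid, bg_in_mosaic: Grid,
                  residual_in_mosaic: Image, alpha: float) -> Tuple[Image, Grid]:
    """Steps 10-11 on the mosaic grid.

    bg_in_mosaic is W_{t0<-t}(chi_bg(s,t)); residual_in_mosaic is
    W_{t0<-t}(C^-1 C{dI(s,t)}); alpha is the blending parameter, 0 <= alpha <= 1.
    """
    h, w = len(m_prime), len(m_prime[0])
    new_m = [[0.0] * w for _ in range(h)]
    new_chi = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            b, mp = bg_in_mosaic[y][x], m_prime[y][x]
            # Step 11: blend the old content with the corrected prediction.
            new_m[y][x] = (1 - alpha * b) * mp + alpha * b * (mp + residual_in_mosaic[y][x])
            # Step 10: a position becomes known once background covers it.
            new_chi[y][x] = 1 if (chi_mosaic_prev[y][x] == 1 or b == 1) else 0
    return new_m, new_chi


m, chi = update_mosaic([[10.0, 20.0]], [[1, 0]], [[1, 0]], [[4.0, 100.0]], alpha=0.5)
print(m, chi)  # [[12.0, 20.0]] [[1, 0]]
```

Algebraically, the Step 11 blend reduces to $M'(s,t-1) + \alpha\,W_{t_0\leftarrow t}(\chi_{bg}(s,t))\,W_{t_0\leftarrow t}(C^{-1}C\{\Delta I(s,t)\})$, so $\alpha$ simply controls how much of the decoded correction is absorbed into the mosaic.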

The method described above builds the mosaic with reference to time $t_0$, which can be a
time instant in the past or can be the current time or a future time instant. It is
straightforward to rewrite the above equations for the case where the mosaic is
continuously warped to the current time instant t.

Turning now to Fig. 4, a block diagram of the method is depicted. The purpose of this
drawing is to highlight the dependencies in the various components and quantities used by
the method of the invention. It also emphasizes the various warping and un-warping
stages that are necessary to align consecutive video fields.
Fig. 5 shows a block diagram of the digital video database system that uses the method of
the current invention.

Fig. 6 shows a block diagram of a video conferencing system that uses an off-line built
background sprite as a dynamic sprite during transmission.

Fig. 7 shows an example of how consecutive positions of a foreground object (here a car)
may be represented in a mosaic by plotting the successive positions of one or several
salient points (V) belonging to the shape of the foreground. The color of the vertices is
changed from $t_0$ to $t_0+1$ and from $t_0+1$ to $t_0+2$ to avoid any confusion. In this example,
the vertices are shown statically in the mosaic and they capture one level of shape
description only.

Operation of the Various Embodiments

A Mosaic-Based Video Conferencing and Videophone System. 

Referring now to Figs. 5 and 6, the communication protocol can include a configuration
phase (of adjustable duration) during which an on-line background mosaic is being built. During
this period, each videophone uses the small displacements of the head and shoulders to
build a background mosaic. The displacements of the foreground can be voluntary (the system
guides the user) or not (there is no gain in coding efficiency if the foreground does not move). In this case,
the method described above is used to build the background mosaic. During normal video
transmission, the mosaic is used as a dynamic sprite and the blending factor is set to 0 to
prevent any updating. In this case, macroblock types may be dynamic or static. In one
extreme case, all macroblocks are static-type macroblocks, meaning that the background
mosaic is being used as a static sprite. In the other extreme case, all macroblocks are of type
dynamic and the mosaic is being used as a dynamic (predictive) sprite. This latter case
requires a higher data transmission bandwidth. Alternatively, a mosaic of the background
scene can be built before the transmission and then be used as a static or a dynamic sprite
during a normal transmission session.

A Mosaic-Based Video Database 


The above method may be used in populating and searching a database of video
bitstreams, i.e., a database of compressed bitstreams. In such a system, video clips are
compressed using the above method. The result is a compressed bitstream and a mosaic
generated during the encoding process. The mosaic image can be used as a representative
image of the video clip bitstream, and its features can be used in indexing and retrieval of
the bitstream belonging to that video clip.

Furthermore, the motion trajectory of the foreground can be overlaid on top of the mosaic to
provide the user with a rough description of foreground motion in the sequence. The trajectory of a
foreground object can be represented by a set of points, each representing the position of a
particular feature of the foreground object at a given instant. The feature points can be
salient vertices of the object shape. A hierarchical description of object shape would bring
the additional advantage of allowing the database interface to overlay coarse to fine
shape outlines in the mosaic. Consecutive vertex positions can be shown together in the
same background mosaic or could be displayed successively in time with the same mosaic
support. Note that this idea provides the additional benefit of facilitating motion-based
retrieval, since the motion of the foreground is represented in mosaic reference space.
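Representing a trajectory in mosaic reference space amounts to pushing each vertex through $W_{t_0\leftarrow t}$ for its frame. The sketch below assumes affine warps stored as 3x3 matrices; `to_mosaic` and `event_list` are hypothetical names. In the example, a hypothetical pan keeps the car at the same frame position, yet its mosaic positions still advance, which is what makes motion-based retrieval possible.

```python
from typing import Dict, List, Tuple

Matrix3 = List[List[float]]
Point = Tuple[float, float]


def to_mosaic(w_t0_from_t: Matrix3, p: Point) -> Point:
    """Map a vertex from frame-t coordinates into mosaic coordinates."""
    x, y = p
    return (w_t0_from_t[0][0] * x + w_t0_from_t[0][1] * y + w_t0_from_t[0][2],
            w_t0_from_t[1][0] * x + w_t0_from_t[1][1] * y + w_t0_from_t[1][2])


def event_list(vertices: Dict[int, List[Point]],
               warps: Dict[int, Matrix3]) -> List[Tuple[int, List[Point]]]:
    """Return [(t, vertex positions in mosaic coordinates), ...] in time order."""
    return [(t, [to_mosaic(warps[t], p) for p in vertices[t]])
            for t in sorted(vertices)]


# Hypothetical W_{t0<-t}: a pure translation of 10*t pixels (camera pan).
pan = {t: [[1.0, 0.0, 10.0 * t], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]] for t in range(3)}
car = {t: [(50.0, 30.0)] for t in range(3)}   # same frame position every time
print(event_list(car, pan))
# [(0, [(50.0, 30.0)]), (1, [(60.0, 30.0)]), (2, [(70.0, 30.0)])]
```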

Referring now to Fig. 7, the background mosaic is composed of the grass, sky, sun and
tree. The foreground object is a car subject to an accelerated motion and moving from left
to right. The shape of the car is shown in black. Eight vertices "V" have been selected to
represent this shape. Fig. 7 shows that consecutive positions of the car can be represented
in the mosaic by simply plotting the vertices at their successive positions. The color of the
vertices is changed from $t_0$ to $t_0+1$ and from $t_0+1$ to $t_0+2$ to avoid any possible
confusion. In this example, the vertices are shown statically in the mosaic and they capture
one level of shape description only. Finally, the mosaic can be used as an icon: by clicking on
the mosaic icon, the user would trigger playback of the sequence.

Support of Multiple Mosaics in Applications with Frequent Scene Changes. 

In the case where the video sequence includes rapid and frequent changes from one scene
to another, as may be the case in video conferencing applications, it is desirable to build
two or more mosaics simultaneously (depending on how many independent scenes there are).
Having more than one mosaic means that the system does not have to re-initiate the
building of a new mosaic each time a scene cut occurs. In this framework, a mosaic is used
and updated only if the video frames being encoded share similar content. Note that more
than one mosaic can be updated at a time since mosaics are allowed to overlap.
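The text does not specify how "similar content" is measured. A minimal sketch, assuming the mean absolute difference between the frame and each warped mosaic $W_{t\leftarrow t_0}(M_k)$ over pixels where that mosaic is known; every mosaic below the threshold is returned, since several overlapping mosaics may be updated at once. The names and the threshold are hypothetical.

```python
from typing import List, Optional

Frame = List[List[float]]
Prediction = List[List[Optional[float]]]


def mean_abs_diff(frame: Frame, prediction: Prediction) -> float:
    """Mean |frame - prediction| over pixels where the mosaic content is known
    (None marks unknown mosaic content); inf if no pixel overlaps."""
    total, count = 0.0, 0
    for row_f, row_p in zip(frame, prediction):
        for a, b in zip(row_f, row_p):
            if b is not None:
                total += abs(a - b)
                count += 1
    return total / count if count else float("inf")


def mosaics_to_update(frame: Frame, predictions: List[Prediction],
                      max_mad: float) -> List[int]:
    """Indices of all mosaics whose warped prediction matches the frame;
    an empty list means: start building a new mosaic."""
    return [k for k, pred in enumerate(predictions)
            if mean_abs_diff(frame, pred) < max_mad]


frame = [[10.0, 10.0], [10.0, 10.0]]
background_a = [[100.0, 100.0], [100.0, 100.0]]
background_b = [[11.0, 9.0], [None, 10.0]]
print(mosaics_to_update(frame, [background_a, background_b], max_mad=5.0))  # [1]
print(mosaics_to_update(frame, [background_a], max_mad=5.0))                # []
```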

Optimal Viewport. 

The arbitrary mapping $W_{t_0\leftarrow t-1}(\cdot)$ used at the beginning of the method can be used to
represent the optimal spatial representation domain for the mosaic, where distortion and
artifacts are minimized. While this is at this point an open problem that will require further
study on our part, there is little doubt that the possibility exists to find an optimal mosaic
representation where ambiguities (parallax problems) and/or distortion are minimized
according to a pre-defined criterion.
Improved Resolution 

Likewise, the arbitrary mapping $W_{t_0\leftarrow t-1}(\cdot)$ can include a zooming factor which has the
effect of building a mosaic whose resolution is potentially 2, 3, or N times larger than the
resolution of the video frames used to build it. The arbitrary fixed zooming factor provides
a mechanism by which fractional warping displacements across consecutive video frames
are recorded as integer displacements in the mosaic. The larger the zooming factor, the
longer the sequence must be before the mosaic can be completed (more pixel locations to
fill up). The MPEG-4 framework allows the implementation of such a scheme.

We denote this arbitrary mapping $W_m(\cdot)$. In the linear case, this operator is the identity
matrix multiplied by a constant scalar greater than 1. This scaling factor defines the
enlargement factor used for the mosaic. The mosaic update equation shown in step 11 can
be rewritten as follows.

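$$M(s,t) = \left[1 - \alpha\, W_m\left(W_{t_0\leftarrow t}(\chi_{bg}(s,t))\right)\right] M'(s,t-1) + \alpha\, W_m\left(W_{t_0\leftarrow t}(\chi_{bg}(s,t))\right) \left[M'(s,t-1) + W_m\left(W_{t_0\leftarrow t}\left(C^{-1}C\{\Delta I(s,t)\}\right)\right)\right]$$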

This equation shows that the mosaic is being built at the fixed time $t_0$, which can be the
time corresponding to the first video frame, the time corresponding to the final frame or
any other time in between. In this case, the arbitrary mapping $W_m(\cdot)$ is always composed
with the warping transformation $W_{t_0\leftarrow t}$. When the mosaic is continuously warped toward
the current video frame, the update equation must be rewritten as follows:

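$$M(s,t) = \left[1 - \alpha\,\chi_{bg}(s,t)\right] W_{t\leftarrow t-1}(M'(s,t-1)) + \alpha\,\chi_{bg}(s,t) \left[W_{t\leftarrow t-1}(M'(s,t-1)) + W_m\left(C^{-1}C\{\Delta I(s,t)\}\right)\right]$$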

The equation above shows that the arbitrary mapping $W_m(\cdot)$ is no longer composed with
the frame-to-frame warping operator $W_{t\leftarrow t-1}$ but is instead applied to the
compressed/de-compressed residuals. In MPEG-4, the arbitrary operator $W_m(\cdot)$ can be transmitted, with
an appropriate extension of the syntax, as the first set of warping parameters, which currently
supports only positioning the first video frame in the mosaic buffer via a translational shift.
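The effect of the zooming factor in $W_m(\cdot)$ on fractional displacements can be checked with exact rational arithmetic. The hypothetical `mosaic_offsets` below accumulates half-pixel frame-to-frame displacements and scales them by the zoom: with a zoom of 2, every offset lands on an integer mosaic position. A zoom of N also multiplies the number of mosaic pixels by $N^2$, which is why a longer sequence is needed to fill the mosaic.

```python
from fractions import Fraction
from typing import List, Tuple


def mosaic_offsets(displacements: List[str], zoom: int) -> Tuple[List[Fraction], bool]:
    """Accumulate frame-to-frame displacements (in frame pixels, given as
    rational strings such as "1/2") and scale them by the zoom factor of W_m.
    Returns the per-frame offsets in mosaic pixels and whether all of them
    fall on integer positions."""
    offsets, position = [], Fraction(0)
    for d in displacements:
        position += Fraction(d)
        offsets.append(position * zoom)
    return offsets, all(o.denominator == 1 for o in offsets)


print(mosaic_offsets(["1/2", "1/2", "1/2"], zoom=1))  # half-pixel offsets: not all integer
print(mosaic_offsets(["1/2", "1/2", "1/2"], zoom=2))  # offsets 1, 2, 3: all integer
```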

Coding of Video Sequences at Very Low Bit Rates.

In very low bit rate applications, the transmission of shape information may become an
undesirable overhead. The method described above can still operate when transmission
of shape information is turned off. This is accomplished by setting the background shape
to one at every pixel (step 7) and setting the blending factor $\alpha$ to 1 (step 11). The
latter setting guarantees that the mosaic will always display the latest video
information, which is a necessity in this situation since the foreground is included in the
mosaic. In this situation, the macroblock types can be either intra, inter, static sprite or
dynamic sprite. The sprite is being used as a static sprite if all macroblocks are of type
static. This is the most likely situation for a very low bit rate application since no
residual is transmitted in this case. The sprite is being used as a dynamic sprite if all
macroblocks are of type dynamic.
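Why $\alpha = 1$ yields the latest video information can be seen by substituting $W_{t_0\leftarrow t}(\chi_{bg}(s,t)) = 1$ (inside the warped support of frame t) and $\alpha = 1$ into the Step 11 update together with the Step 9 residual. The derivation below assumes, beyond what the text states, that the macroblock is predicted from the mosaic, that residual coding is lossless ($C^{-1}C\{\Delta I\} = \Delta I$), that warping is linear in pixel values, and that $W_{t_0\leftarrow t}$ and $W_{t\leftarrow t_0}$ are exact inverses:

$$
\begin{aligned}
M(s,t) &= \left[1 - 1\cdot 1\right] M'(s,t-1) + 1\cdot 1\left[M'(s,t-1) + W_{t_0\leftarrow t}\left(C^{-1}C\{\Delta I(s,t)\}\right)\right] \\
&= M'(s,t-1) + W_{t_0\leftarrow t}\left(I(s,t) - W_{t\leftarrow t_0}(M'(s,t-1))\right) \\
&= W_{t_0\leftarrow t}(I(s,t))
\end{aligned}
$$

The mosaic therefore holds the current frame, foreground included, up to coding error.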
CLAIMS

1. A method of sprite-based predictive video coding (encoding and decoding)
where sprite-building is automatic, and segmentation of the sprite object is automatic and
integrated into the sprite building as well as the encoding and decoding processes,
comprising:

initializing a binary field to zero for every position in a buffer;

acquiring the image and computing forward warping parameters for mapping
the image;

detecting any change in content between a previously coded/decoded frame
and the current frame;

identifying new background areas;

segmenting the foreground and background;

preparing foreground shape and texture by compressing or decompressing
the subject shapes;

deriving the background shape from the previously prepared foreground
shape;

initializing the new background texture in the mosaic;

determining the background texture residuals from the mosaic prediction;

updating the background shape in the mosaic; and

updating the mosaic in all regions corresponding to new or non-covered
background.

2. A compressed video database system wherein sprites built during encoding
are used as representative images of input video clips that can be analyzed and indexed for
storage and retrieval purposes, comprising:

a sprite-based encoder for receiving a video clip and generating a video
bitstream and a mosaic;

a feature extractor for extracting features from said mosaic and for
identifying representative features;

a video database generator for generating a video database from said
representative features and said video bitstream; and

a search engine for searching said video database for selected
representative features.